Derivation of the Born Rule from Operational Assumptions

1 Introduction

Whence the Born rule? It is fundamental to quantum mechanics; it is the essential link between probability and a formalism which is otherwise deterministic; it encapsulates the measurement postulates. Gleason's theorem [4] is mathematically informative, but its premises are too strong to have any direct operational meaning: here the Born rule is derived more simply, from purely operational assumptions.

The argument we shall present is based on Deutsch's derivation of the Born rule from decision theory [2]. The latter was criticized by Barnum et al. [1], but their objections hinged on ambiguities in Deutsch's notation that have recently been resolved by Wallace [12]; here we follow Wallace's formulation. The argument is not quite the same as Wallace's, however. Wallace draws heavily on the Everett interpretation, as well as on decision theory; like Deutsch, he is concerned with constraints on subjective probability, rather than any objective counterpart to it. In contrast, the derivation of the Born rule that we shall present is independent of decision theory, independent of the interpretation of probability, and independent of any assumptions about the measuring process. As such it applies to all the major foundational approaches to quantum mechanics.

We assume the conventional scheme for the description of experiments: an initial state, measured observable, and set of macroscopic outcomes. Given a description of this form, we assume there is a general algorithm for the expectation value of the observable outcomes (the Born rule is such an algorithm). The argument then takes the following form: for a particular class of experiments there are definite rules for determining such descriptions, based on simple operational rules, and theoretical assumptions that concern only the state-preparation device, not the measurement device. These rules imply that in general such experiments can be described in different ways. But the algorithm we are looking for concerns the expectation value of the observed outcomes, so applied to these different descriptions, it must yield the same expectation value. Constraints of this form are in fact sufficient to force the Born rule. If there is to be such an algorithm, then it is the Born rule.

¹ Contact: simon.saunders@linacre.ox.ac.uk, and http://users.ox.ac.uk/~ppox/

2 Multiple-Channel Experiments

The kinds of experiments we shall consider are limited in the following respects: they are repeatable; there is a clear distinction between the state-preparation device and the detection and registration device; and (this is the most important limitation) we assume that for a given state-preparation device, preparing the system to be measured in a definite initial state, the state can be resolved into channels, each of which can be independently blocked, in such a way that when only one channel is open the outcome of the experiment is deterministic, in the sense that if there is any outcome at all (on repetition of the experiment) it is always the same outcome. We further suppose that for every outcome there is at least one channel for which it is deterministic, and, in order to associate a definite initial state with a particular region of the apparatus, we suppose that all the channels are recombined prior to the measurement process proper.

For an example of such an experiment that measures spin, consider a neutron interferometer, where orthogonal states of spin (with respect to a given axis) are produced by a beam-splitter, each propagating along a different arm of the interferometer, before being recombined prior to the measurement of that component of spin. For an example that measures position, consider an optical two-slit experiment, adapted so that the lensing system after the slits first brings the light into coincidence, but then focuses it on detectors in such a way that each can receive light from only one of the slits. It is not too hard to specify an analogous procedure in the case of momentum;² any number of familiar experiments can be converted into an experiment of this kind.

We introduce the following notation. Let there be $d$ channels in all, with $D \le d$ possible outcomes $u_j \in \mathcal{U}$, $j = 1, \dots, D$. These outcomes are macroscopic events (e.g. positions of pointers). Let $M$ denote the experiment that is performed when all the channels are open, and $M_k$, $k = 1, \dots, d$, the (deterministic) experiment that is performed when only the $k$th channel is open. Let there be identifiable regions $r_1, r_2, \dots$ of the state-preparation device through which the system to be measured must pass (if it is to be subsequently detected at all, regardless of which channels are open). Call an experiment satisfying these specifications a multiple-channel experiment.

One could go further, and provide operational definitions of the initial states in each case, but we are looking for a probability algorithm that can be applied to states that are mathematically defined (so any operational definition of the initial state would eventually have to be converted into a mathematical one): we may as well work with the mathematical state from the beginning.

3 Models of Experiments

Turn now to the schematic, mathematical descriptions of experiments. Our assumptions are conventional: we suppose that an experiment is designed to measure some observable $X$ on a complex Hilbert space $\mathcal{H}$, which for convenience we take to be of finite dimensionality;³ we suppose that the apparatus is prepared in some initial state $\psi$, normalized to one,⁴ and that on measurement one of a finite number of microscopic outcomes $\lambda_k \in \mathrm{Sp}(X)$, $k = 1, \dots, d$, results (we allow for repetitions, i.e. for some $j \neq k$ we may have $\lambda_j = \lambda_k$). We suppose that these microscopic events are amplified up to the macroscopic level by some physical process $\Omega : \mathrm{Sp}(X) \to \mathcal{U}$, yielding one or other of the $D$ possible displayed outcomes $u_j \in \mathcal{U}$. We suppose the latter macroscopic events occur with probabilities $p_j$, $j = 1, \dots, D$.⁵

² The conventional method for preparing a beam of charged particles of definite momentum (by selecting for deflection in a magnetic field) can be adapted quite simply.

We take it that the details of the detection and amplification process are what are disputed, not that there is such a process, nor that it results in macroscopic outcomes $u_j$. The probabilities computed from records of repeated trials concern in the first instance these registered, macroscopic outcomes, not the unobservable microscopic events $\lambda_k$ (indeed, on some approaches to foundations, there are no probabilistic microscopic events prior to amplification up to the macroscopic level). To keep this distinction firmly in mind (and the distinction between the sets $\mathcal{U}$ and $\mathbb{R}$), we shall not assume (as is usual) the numerical equality of $\Omega(\lambda_k)$ with $\lambda_k$; we do, however, assume that the macroscopic outcomes $u_j \in \mathcal{U}$ are physical numerals, so that addition and multiplication operations can be defined on them. For convenience we assume that none of these numerals is zero.

Call the triple $\langle \psi, X, \Omega \rangle$ an experimental model, denoted $g$. This scheme extends without any modification to experiments where there are inefficiencies in the detection and registration devices, so long as they are the same for every channel. (A more sophisticated scheme will be needed if the efficiencies differ from one channel to the next, however; we neglect this complication here.)

This scheme applies to a much wider variety of experiments than multiple-channel experiments; the Born rule is conventionally stated in just these terms. We shall be interested in algorithms that assign real numbers to experimental models, interpreted as expectation values, i.e. weighted averages of the quantities $u_j$, with weights given by the probabilities $p_j$ of each $u_j$, $j = 1, \dots, D$. We are therefore looking for a map $V : g \to \mathbb{R}$ of the form:

$$V[\psi, X, \Omega] = \sum_{j=1}^{D} p_j u_j, \qquad \sum_{j=1}^{D} p_j = 1. \tag{1}$$

If $D = d$ we can write the $u_j$'s directly in terms of the $\Omega(\lambda_k)$'s. Otherwise define $\Omega^{-1}(u_j) = \{k : \Omega(\lambda_k) = u_j\}$, $j = 1, \dots, D$, and choose any real numbers $w_k \in [0, 1]$, $k = 1, \dots, d$, such that $\sum_{k \in \Omega^{-1}(u_j)} w_k = p_j$. From Eq.(1) we obtain:

$$V[\psi, X, \Omega] = \sum_{k=1}^{d} w_k\, \Omega(\lambda_k), \qquad \sum_{k=1}^{d} w_k = 1. \tag{2}$$

Conversely, given any $d$ real numbers $w_k \in [0, 1]$ satisfying Eq.(2), define the $D$ numbers $p_j = \sum_{k \in \Omega^{-1}(u_j)} w_k$; from Eq.(2) we obtain Eq.(1).

³ It would be just as easy to work with Hilbert spaces of countably infinite dimension, and restrict instead the observables to self-adjoint operators with purely point spectra. (The difficulty with observables with continuous spectra is purely technical, however.)

⁴ Later on we shall consider the consequences of relaxing the normalization condition (correspondingly, we use the term "state" loosely, to mean any Hilbert space vector defined up to phase).

⁵ In the case of the Everett interpretation, we say rather that all of the macroscopic outcomes result, but that each of them is in a different branch (with a given amplitude). (We will consider the interpretation of probability in the Everett interpretation in due course.)

In what follows, we assume the existence of probabilities $p_j$ satisfying Eq.(1), and therefore that there are real numbers $w_k$ satisfying Eq.(2). The latter will prove more convenient for calculations.
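The passage between Eq.(1) and Eq.(2) is pure bookkeeping, so it can be checked mechanically. The Python sketch below is only an illustration under stated assumptions, not part of the argument. It represents $\Omega$ as a list of the numerals displayed by each channel and uses exact fractions. It then picks one admissible split of each $p_j$ over $\Omega^{-1}(u_j)$. The even split is a hypothetical choice, since any split with the right sums would do. The function names and the numerals 1 and 2 are also illustrative assumptions.

```python
from collections import defaultdict
from fractions import Fraction


def outcome_probabilities(omega, w):
    """Eq.(2) -> Eq.(1): p_j is the sum of w_k over k in Omega^{-1}(u_j)."""
    p = defaultdict(Fraction)
    for u, wk in zip(omega, w):
        p[u] += wk
    return dict(p)


def channel_weights(omega, p):
    """Eq.(1) -> Eq.(2): split each p_j evenly over the channels k with
    Omega(lambda_k) = u_j (one admissible choice among many)."""
    counts = defaultdict(int)
    for u in omega:
        counts[u] += 1
    return [p[u] / counts[u] for u in omega]


def expectation_from_outcomes(p):
    """Right-hand side of Eq.(1): sum_j p_j u_j."""
    return sum(pj * u for u, pj in p.items())


def expectation_from_channels(omega, w):
    """Right-hand side of Eq.(2): sum_k w_k Omega(lambda_k)."""
    return sum(wk * u for u, wk in zip(omega, w))


# d = 4 channels, D = 2 outcomes; the numerals 1 and 2 are hypothetical (never 0)
omega = [1, 1, 2, 2]
p = {1: Fraction(1, 3), 2: Fraction(2, 3)}
w = channel_weights(omega, p)
print(w)
print(expectation_from_outcomes(p), expectation_from_channels(omega, w))
```

Here both expectation values equal 5/3, and `outcome_probabilities(omega, w)` recovers `p`. The same $V$ is therefore obtained from either form.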

4 The Consistency Condition

Our general strategy is as follows. In the special case of multiple-channel experiments, there are clear criteria for when an experiment is to be assigned a given model. There follows an important constraint on $V$: if $M$ is assigned two distinct models $g$, $g'$, and if there is to be any general algorithm $V : g \to \mathbb{R}$, then the expectation values it assigns to these two models had better agree, i.e. $V(g) = V(g')$. We view this as a consistency condition on $V$. Failing this condition, expectation values of models could have no unequivocal experimental meaning. The probabilistic outcome events $u_j \in \mathcal{U}$ that we are talking of are all observable, and it is the mean values of these that the quantities $V(g)$ concern. If one and the same mean value is matched to two expectation values, $V(g) \neq V(g')$, then either the experiment cannot be modelled by both $g$ and $g'$, or there is no algorithm $V$ for mapping models to expectation values.

That a condition of this kind played a tacit role in Deutsch's derivation was recognized by Wallace. It was used explicitly in Wallace's deduction [12] of the Born rule, although there it was cast in a slightly different form, and the conditions for its use were stated in terms of the Everett theory of measurement (including the theory of the detection and registration process). Here we make do with operational criteria, and with assumptions about the behavior of the state prior to any detection events: we suppose that this prior evolution of the state is purely deterministic,⁶ and governed by the unitary formalism of quantum mechanics.

Consider a multiple-channel experiment $M$. By assumption, there are $d$ deterministic experiments $M_k$, $k = 1, \dots, d$, that can also be performed with this apparatus, on blocking every channel save the $k$th, each yielding one of the $D$ macroscopic outcomes $u_j \in \mathcal{U}$. Given that the initial state in region $r$ for $M_k$ is $\varphi_k$, it is clear enough, on operational grounds, what can be counted as a model for this experiment: the experiment measures any $X$ such that $X\varphi_k = \lambda_k\varphi_k$, for any $\lambda_k$, and any $\Omega$ such that $\Omega(\lambda_k) \in \mathcal{U}$ is the outcome of $M_k$.

⁶ Of course in its initial phases the process of state preparation will involve probabilistic events, if only in collimating particles produced from the source, or in blocking particular channels. But it does not matter what these probabilities are; all that matters is that if a particle is located in a given region of the apparatus, then it is in a definite state, and unitarily develops in a definite way (prior to any detection or registration process).
Now consider the indeterministic experiment $M$ with every channel open. We suppose that the state of $M$ at $r$ is $\psi = \sum_{k=1}^{d} c_k\varphi_k$; then the observable measured is any $X$ such that $X\varphi_k = \lambda_k\varphi_k$ for $k = 1, \dots, d$, and any $\Omega$ such that $\Omega(\lambda_k) \in \mathcal{U}$ is the outcome of each $M_k$.

Let us state this as a definition:

Definition 1 Let $M$ have $d$ channels and $D$ outcomes. Then $M$ realizes $\langle \psi, X, \Omega \rangle$ if and only if

(i) for some region $r$ and orthogonal states $\varphi_k$, the state of $M_k$ in $r$ is $\varphi_k$, $k = 1, \dots, d \ge D$, and the state of $M$ in $r$ is $\psi = \sum_{k=1}^{d} c_k\varphi_k$,

(ii) $X\varphi_k = \lambda_k\varphi_k$, $k = 1, \dots, d$,

(iii) $\Omega(\lambda_k)$ is the outcome of $M_k$, $k = 1, \dots, d$.

The definition applies equally to a deterministic experiment (the limiting case in which $d = D = 1$). Recall that, from our definition of multiple-channel experiments, for each $u_j \in \mathcal{U}$ there is at least one $M_k$ for which $u_j$ is deterministic. It follows from (ii) and (iii) that $X$ has at least $D$ distinct eigenvalues.

Why is it right to model experiments in this way and not some other? The deterministic case speaks for itself; in the indeterministic case, the short answer is that it is underwritten by the linearity of the equations of motion. An apparatus that deterministically measures each eigenvalue $\lambda_k$ of $X$, when the state in a given region of the apparatus is $\varphi_k$, will indeterministically measure the eigenvalues $\lambda_k$ of $X$ when the state in that region is a superposition of the $\varphi_k$'s. This principle is implicit in standard laboratory procedures: this is how measuring devices are standardly calibrated, and how their functioning is checked.

The consistency condition now reads:

Definition 2 $V$ is consistent if and only if $V(g) = V(g')$ whenever $g$ and $g'$ can be realized by the same experiment.

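In the deterministic case evidently:

$$V[\varphi_k, \lambda_k P_{\varphi_k}, \Omega] = \Omega(\lambda_k). \tag{3}$$

We will show that if $\|\psi\| = 1$ and $V$ is consistent, with $\langle\cdot,\cdot\rangle$ the inner product on $\mathcal{H}$, then⁷

$$V[\psi, X, \Omega] = \langle \psi, \Omega(X)\psi \rangle. \tag{4}$$

Eq.(4) is the Born rule.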
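For reference, the right-hand side of Eq.(4) is easy to evaluate once $\psi$ is expanded in the eigenbasis of $X$: it reduces to $\sum_k |c_k|^2\,\Omega(\lambda_k)$. The Python sketch below computes it literally as an inner product, and treats the numerals $\Omega(\lambda_k)$ as ordinary numbers, which is the assumption of footnote 7. The function name `born_expectation` and the numerals 1 and 2 are hypothetical choices for illustration.

```python
import math


def born_expectation(c, eigenvalues, omega):
    """<psi, Omega(X) psi> for psi = sum_k c_k phi_k, with {phi_k} orthonormal
    and X phi_k = lambda_k phi_k; Omega(X) acts as Omega(lambda_k) on phi_k,
    extended by linearity as in footnote 7."""
    # Components of Omega(X) psi in the basis {phi_k}
    omega_x_psi = [omega(lam) * ck for lam, ck in zip(eigenvalues, c)]
    # Inner product, antilinear in its first argument
    value = sum(ck.conjugate() * vk for ck, vk in zip(c, omega_x_psi))
    return value.real


# Spin half: eigenvalues +1/2 and -1/2; the numerals 1 and 2 are hypothetical
omega = {0.5: 1, -0.5: 2}.get
r = 1 / math.sqrt(2)
print(born_expectation([r, 1j * r], [0.5, -0.5], omega))  # equal norms
print(born_expectation([1, 0], [0.5, -0.5], omega))       # deterministic case, Eq.(3)
```

The first call gives 1.5 up to rounding, the average of the two numerals. The second reproduces Eq.(3).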

We begin with some simple consequences of the consistency condition. The Born rule is then derived in stages: first for equal norms in the simplest possible case of a spin-half system; then for the general case of equal norms; and then for rational norms. The general case of irrational norms is handled by a simple continuity condition. As promised, we shall also derive a probability rule for initial states normalized to arbitrary finite numbers.

⁷ Whilst $\Omega(X)$ makes no sense as an operator (as the values of $\Omega$ are physical numerals like pointer positions, not real numbers), we are assuming that arithmetic operations can be defined for the $\Omega(\lambda_k)$'s; define $\langle \psi, \Omega(\lambda_k)P_{\varphi_k}\psi \rangle = \Omega(\lambda_k)\langle \psi, P_{\varphi_k}\psi \rangle$ accordingly, and extend by linearity.


5 Consequences of the Consistency Condition

We prove four general constraints on $V$ that follow from consistency. (Eqs.(5)-(8) may be found in Wallace [12], derived on somewhat different assumptions.) In each case an equality is derived from the fact that a single experiment realizes two different models: by consistency, each must be assigned the same expectation value.

We assume it is not in doubt that there do exist such experiments, in which the initial state (prior to any detection or amplification process) evolves unitarily in the manner stated.

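Lemma 3 Let $V$ be consistent. It follows that:

(i) For invertible $f : \mathbb{R} \to \mathbb{R}$:

$$V[\psi, X, \Omega] = V[\psi, f(X), \Omega \circ f^{-1}]. \tag{5}$$

(ii) For orthogonal projectors $\{P_k\}$, $k = 1, \dots, d$, such that $P_k\varphi_j = \delta_{kj}\varphi_j$:

$$V\Big[\sum_{k=1}^{d} c_k\varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = V\Big[\sum_{k=1}^{d} c_k\varphi_k, \sum_{k=1}^{d} \lambda_k P_k, \Omega\Big]. \tag{6}$$

(iii) For $U_\theta : \varphi_k \to e^{i\theta_k}\varphi_k$, $k = 1, \dots, d$, for arbitrary $\theta_k \in [0, 2\pi] \subset \mathbb{R}$:

$$V\Big[\psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = V\Big[U_\theta\psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big]. \tag{7}$$

(iv) For $U_\pi : \varphi_k \to \varphi_{\pi(k)}$, where $\pi$ is any permutation of $\langle 1, \dots, d \rangle$:

$$V[\psi, X, \Omega] = V[U_\pi\psi, \pi^{-1}(X), \Omega]. \tag{8}$$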

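Proof. Let $g = \langle \psi, X, \Omega \rangle$ be realized by $M$ with $d$ channels. Then for some region $r_1$ the state of $M_k$ is $\varphi_k$, $k = 1, \dots, d$, and that of $M$ is $\psi = \sum_{k=1}^{d} c_k\varphi_k$. There also exist (not necessarily distinct) real numbers $\lambda_k$ such that $X\varphi_k = \lambda_k\varphi_k$ and $\Omega(\{\lambda_k\}) = \mathcal{U}$. For invertible $f$ we have $(\Omega \circ f^{-1})(f(\lambda_k)) = \Omega(\lambda_k)$ and $f(X)\varphi_k = f(\lambda_k)\varphi_k$. Hence $M$ realizes $\langle \psi, f(X), \Omega \circ f^{-1} \rangle$, and (i) follows from consistency. Further, $M$ realizes any other model $\langle \psi, Y, \Omega \rangle$ such that $Y\varphi_k = \lambda_k\varphi_k$. Since $\sum_{k=1}^{d} \lambda_k P_k$ is such a $Y$, (ii) follows from consistency.

Suppose now that $\psi$ evolves unitarily to the state $U_\theta\psi$ in region $r_2$. Then in $r_2$ the state of each $M_k$ is $e^{i\theta_k}\varphi_k$. Since $P_{e^{i\theta_k}\varphi_k} = P_{\varphi_k}$, $M$ realizes $\langle U_\theta\psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega \rangle$, and (iii) follows from consistency.

Finally, let $\psi$ subsequently evolve to the state $U_\pi\psi$ in region $r_3$. Then in $r_3$ the state of each $M_k$ is $\varphi_{\pi(k)}$, and the state of $M$ is $\sum_{k=1}^{d} c_k\varphi_{\pi(k)}$. Without loss of generality, we may write $X$ as $\sum_{k=1}^{d} \lambda_k P_{\varphi_k}$. Then $\pi^{-1}(X) = \sum_{k=1}^{d} \lambda_k P_{\varphi_{\pi(k)}}$ satisfies $\pi^{-1}(X)\varphi_{\pi(k)} = \lambda_k\varphi_{\pi(k)}$, so $M$ realizes $\langle U_\pi\psi, \pi^{-1}(X), \Omega \rangle$, and (iv) follows from consistency. ■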

Eqs.(5)-(8) are of course trivial consequences of the Born rule, Eq.(4). Note further that in each case the observables whose expectation values are identified commute: these are constraints among probability assignments to projectors belonging to a single resolution of the identity. Finally, note that the normalization of the initial state $\psi$ played no role in the proofs.
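That Eqs.(5), (7) and (8) are satisfied by the Born rule can be confirmed numerically. The Python sketch below works in the eigenbasis $\{\varphi_k\}$ of $X$, so Eq.(6) is not separately visible there, since it only swaps one family of projectors for another that acts identically on the $\varphi_k$. The eigenvalues, the numerals, the map $f$, the phases and the permutation are all arbitrary, hypothetical choices. This is a sanity check of the Born rule, not a proof of Lemma 3.

```python
import cmath
import math
import random


def born(c, lams, omega):
    """Born-rule value sum_k |c_k|^2 Omega(lambda_k) for psi = sum_k c_k phi_k,
    X = sum_k lams[k] P_{phi_k}, {phi_k} orthonormal and psi normalized."""
    return sum(abs(ck) ** 2 * omega(lam) for ck, lam in zip(c, lams))


def omega(lam):
    """Hypothetical numerals displayed for eigenvalue lam (never zero here)."""
    return 10.0 + lam


def f(x):
    return 3 * x + 7


def f_inv(y):
    return (y - 7) / 3


random.seed(0)
d = 4
c = [complex(random.gauss(0, 1), random.gauss(0, 1)) for _ in range(d)]
norm = math.sqrt(sum(abs(ck) ** 2 for ck in c))
c = [ck / norm for ck in c]
lams = [1.0, -1.0, 2.0, 1.0]  # a repeated eigenvalue is allowed

base = born(c, lams, omega)

# Eq.(5): relabel eigenvalues with an invertible f, display via Omega o f^{-1}
eq5 = born(c, [f(lam) for lam in lams], lambda y: omega(f_inv(y)))

# Eq.(7): multiply each component by an arbitrary phase exp(i theta_k)
thetas = [random.uniform(0, 2 * math.pi) for _ in range(d)]
eq7 = born([cmath.exp(1j * t) * ck for t, ck in zip(thetas, c)], lams, omega)

# Eq.(8): U_pi sends phi_k to phi_{pi(k)}; pi^{-1}(X) gives phi_{pi(k)} eigenvalue lams[k]
pi = [2, 0, 3, 1]
c_pi = [0j] * d
lams_pi = [0.0] * d
for k in range(d):
    c_pi[pi[k]] = c[k]
    lams_pi[pi[k]] = lams[k]
eq8 = born(c_pi, lams_pi, omega)

print(base, eq5, eq7, eq8)
```

All four printed values agree up to floating-point rounding.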

6 Case 1: The Stern-Gerlach Experiment for Equal Norms

Consider the Stern-Gerlach experiment with $d = D = 2$. Let $X = \tfrac{1}{2}P_+ - \tfrac{1}{2}P_- = \sigma_z$ (in conventional notation), the observable for the $z$-component of spin with eigenstates $\varphi_\pm$, and let $\psi = c_+\varphi_+ + c_-\varphi_-$. Let $U_\pi$ interchange $\varphi_+$ and $\varphi_-$, so $U_\pi\sigma_z U_\pi^{-1} = -\sigma_z$. From Lemma 3(iv) it follows that:

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = V[c_+\varphi_- + c_-\varphi_+, -\sigma_z, \Omega]. \tag{9}$$

From Eq.(9) and Lemma 3(i):

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = V[c_+\varphi_- + c_-\varphi_+, \sigma_z, \Omega \circ -I] \tag{10}$$

(where $(\Omega \circ -I)(x) = \Omega(-x)$). From Eq.(10), in the special case that $|c_+|^2 = |c_-|^2$, and using Lemma 3(iii) to compensate for any differences in phase:

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega \circ -I]. \tag{11}$$

Consider the LHS of this equality. From Eq.(2), write $w_1 = w$, $w_2 = 1 - w$, and $\Omega(\pm\tfrac{1}{2}) = \Omega(\pm)$, so that $\Omega(+)$ results with probability $w$ and $\Omega(-)$ results with probability $1 - w$. This gives the expectation value $x = w\Omega(+) + (1 - w)\Omega(-)$. But by similar reasoning, the RHS yields $w\Omega(-) + (1 - w)\Omega(+) = -x + \Omega(+) + \Omega(-)$. Equating the two gives $2x = \Omega(+) + \Omega(-)$, i.e. $x = \tfrac{1}{2}[\Omega(+) + \Omega(-)]$.

We have shown, for $|c_+|^2 = |c_-|^2$:

$$V[c_+\varphi_+ + c_-\varphi_-, \sigma_z, \Omega] = \tfrac{1}{2}\Omega(+) + \tfrac{1}{2}\Omega(-) \tag{12}$$

in accordance with the Born rule. Note that here we have derived an expectation value in a situation (dimension 2) where Gleason's theorem does not apply. (Note that the normalization of the initial state $\psi$ is again irrelevant to the result.)
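The last step of Case 1 can also be checked by brute force. The Python sketch below is illustrative only. The numerals 3 and 7 and the grid of trial weights are our own hypothetical choices. It evaluates the two sides of Eq.(11) in the form used above, comparing $x = w\Omega(+) + (1-w)\Omega(-)$ with $w\Omega(-) + (1-w)\Omega(+)$, and keeps only the trial values of $w$ for which they agree. Their difference is $(2w - 1)(\Omega(+) - \Omega(-))$, so only $w = 1/2$ can survive whenever $\Omega(+) \neq \Omega(-)$.

```python
from fractions import Fraction


def lhs(w, om_plus, om_minus):
    """Expectation for <psi, sigma_z, Omega>: Omega(+) with weight w."""
    return w * om_plus + (1 - w) * om_minus


def rhs(w, om_plus, om_minus):
    """Same weights, but Omega o -I swaps which numeral each channel displays."""
    return w * om_minus + (1 - w) * om_plus


om_plus, om_minus = 3, 7  # hypothetical distinct, non-zero numerals
grid = [Fraction(k, 20) for k in range(21)]
consistent = [w for w in grid if lhs(w, om_plus, om_minus) == rhs(w, om_plus, om_minus)]
print(consistent)  # only w = 1/2 survives
print(lhs(Fraction(1, 2), om_plus, om_minus))
```

At $w = 1/2$ the expectation value is the average of the two numerals, as Eq.(12) states.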
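7 Case 2: General Superpositions of Equal Norms

Consider an arbitrary observable on any $d$-dimensional subspace $\mathcal{H}_d$ of Hilbert space. By the spectral theorem, we may write $X = \sum_{k=1}^{d} \lambda_k P_{\varphi_k}$ for some set of orthogonal vectors $\varphi_k$, $k = 1, \dots, d$, spanning $\mathcal{H}_d$, where there may be repetitions among the $\lambda_k$'s. Let $\psi$ be a (not necessarily normalized) vector in $\mathcal{H}_d$; then $\psi = \sum_{k=1}^{d} c_k\varphi_k$ for some $d$-tuple of complex numbers $\langle c_1, \dots, c_d \rangle$. For any permutation $\pi$, write $\Omega \circ \pi$ for the map $\lambda_k \mapsto \Omega(\lambda_{\pi(k)})$. From Lemma 3(iv) and (i) we have:

$$V\Big[\sum_{k=1}^{d} c_k\varphi_k, X, \Omega\Big] = V\Big[\sum_{k=1}^{d} c_k\varphi_{\pi(k)}, \pi^{-1}(X), \Omega\Big] = V\Big[\sum_{k=1}^{d} c_k\varphi_{\pi(k)}, X, \Omega \circ \pi^{-1}\Big]. \tag{13}$$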

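Suppose $|c_k|^2 = |c_j|^2$ for $j, k = 1, \dots, d$. Use Lemma 3(iii) as before to adjust for any phase differences, and note that $\pi^{-1}$ ranges over all permutations as $\pi$ does. Then:

$$V[\psi, X, \Omega] = V[\psi, X, \Omega \circ \pi]. \tag{14}$$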

Let $\langle w_1, \dots, w_d \rangle$ be a $d$-tuple of non-negative real numbers satisfying Eq.(2). From Eq.(14):

$$\sum_{k=1}^{d} w_k\,\Omega(\lambda_k) = \sum_{k=1}^{d} w_k\,\Omega(\lambda_{\pi(k)}). \tag{15}$$

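Eq.(15) holds for any permutation; let $\pi$ interchange $j$ and $k$, and otherwise act as the identity. There follows

$$w_j\Omega(\lambda_j) + w_k\Omega(\lambda_k) = w_k\Omega(\lambda_j) + w_j\Omega(\lambda_k), \tag{16}$$

that is, $(w_j - w_k)(\Omega(\lambda_j) - \Omega(\lambda_k)) = 0$. Conclude that if $\Omega(\lambda_j) \neq \Omega(\lambda_k)$ then $w_k = w_j$ (recall that by convention $0 \notin \mathcal{U}$, so $\Omega(\lambda_k)$ is never zero).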

If $D = d$, evidently $w_k = w_j$ for all $j, k = 1, \dots, d$. Since $\sum_{k=1}^{d} w_k = 1$, it follows that $w_k = 1/d$, $k = 1, \dots, d$. Therefore

$$V[\psi, X, \Omega] = \frac{1}{d}\sum_{k=1}^{d} \Omega(\lambda_k). \tag{17}$$

If not, suppose $\Omega(\lambda_j) = \Omega(\lambda_k)$ for $j, k = 1, \dots, b \le d$. (If $b = d$, Eq.(17) follows trivially.) For any $j, k$ such that $b < j \le d$ and $k \le b$, $\Omega(\lambda_k) \neq \Omega(\lambda_j)$, from which we conclude as before that $w_k = w_j$. Note further that under the stated conditions, $1/d = |c_k|^2\big(\sum_{j=1}^{d} |c_j|^2\big)^{-1}$. We have proved

Theorem 4 Let $\psi = \sum_{k=1}^{d} c_k\varphi_k$, where $|c_j|^2 = |c_k|^2$ for all $j, k = 1, \dots, d$. Then if $V$ is consistent

$$V\Big[\sum_{k=1}^{d} c_k\varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = \sum_{k=1}^{d} \frac{|c_k|^2}{\sum_{j=1}^{d} |c_j|^2}\,\Omega(\lambda_k). \tag{18}$$

Like Lemma 3, Theorem 4 is independent of the normalization of $\psi$.
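As a numerical sanity check on Theorem 4, not a substitute for the proof, the Python sketch below takes $d = 5$ coefficients of a common modulus with random phases and a deliberately unnormalized $\psi$. It confirms that the weights in Eq.(18) are all $1/d$, so that Eq.(18) reduces to the plain average of Eq.(17). It also checks that the sum in Eq.(15) is unchanged by every transposition. The numerals, the common modulus 3.7 and the random seed are arbitrary, hypothetical choices.

```python
import cmath
import math
import random


def born_weights(c):
    """Weights |c_k|^2 / sum_j |c_j|^2 appearing in Eq.(18)."""
    norm2 = sum(abs(ck) ** 2 for ck in c)
    return [abs(ck) ** 2 / norm2 for ck in c]


random.seed(1)
d = 5
r = 3.7                                  # common modulus; psi is NOT normalized
c = [r * cmath.exp(1j * random.uniform(0, 2 * math.pi)) for _ in range(d)]
omega_vals = [2.0, 5.0, 5.0, 7.0, 11.0]  # hypothetical numerals Omega(lambda_k)

w = born_weights(c)
eq18 = sum(wk * om for wk, om in zip(w, omega_vals))
eq17 = sum(omega_vals) / d

# Eq.(15): uniform weights are unchanged by any transposition of the numerals
for j in range(d):
    for k in range(j + 1, d):
        swapped = list(omega_vals)
        swapped[j], swapped[k] = swapped[k], swapped[j]
        assert math.isclose(sum(wk * om for wk, om in zip(w, swapped)), eq18)

print(w, eq18, eq17)
```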


8 Case 3: d = 2 Normalized Superpositions with Rational Norms

The idea for extending these methods to the case of unequal but rational norms is as follows. Consider an experiment in which the initial state $\psi$ evolves deterministically. Each component $\varphi_k$, entering into the initial superposition with amplitude $c_k$, evolves into a superposition of $z_k$ orthogonal states of equal norm $1/\sqrt{z_k}$, chosen so that $|c_k|/\sqrt{z_k}$ is the same for all $k$. One can then show that the experiment has a model in which the initial state is a superposition of states of equal norms, so Theorem 4 can be applied. (Evidently, for this to work each $|c_k|^2$ will have to be a rational number.)

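For simplicity, consider first the case $d = 2$ for real amplitudes. Let $\psi = c_1\varphi_1 + c_2\varphi_2$ with $c_1 = \sqrt{m/(m+n)}$ and $c_2 = \sqrt{n/(m+n)}$, where $m$ and $n$ are integers. Let $X = \lambda_1 P_{\varphi_1} + \lambda_2 P_{\varphi_2}$. We will show that if $V$ is consistent, then $V[\psi, X, \Omega] = \frac{m}{m+n}\Omega(\lambda_1) + \frac{n}{m+n}\Omega(\lambda_2)$.

Let the deterministic experiments of $M$ be $M_1$, $M_2$, with registered outcomes $\Omega(\lambda_1)$, $\Omega(\lambda_2)$ respectively. Let the initial states of $M$, $M_1$, $M_2$ in region $r_1$ be $\psi$, $\varphi_1$, $\varphi_2$ respectively. Then $M$ realizes $g_1 = \langle \psi, X, \Omega \rangle$. Now let $\psi$ evolve to $U\psi$ in region $r_2$, where $U\varphi_1 = \frac{1}{\sqrt{m}}\sum_{k=1}^{m}\chi_k$ and $U\varphi_2 = \frac{1}{\sqrt{n}}\sum_{k=m+1}^{m+n}\chi_k$, for some orthogonal set of vectors $\{\chi_k\}$, $k = 1, \dots, m+n$. Denote $\lambda_1 P_{U\varphi_1} + \lambda_2 P_{U\varphi_2}$ by $Y$. Then the initial state of $M_i$, $i = 1, 2$, is $U\varphi_i$ in $r_2$, whilst that of $M$ is $c_1U\varphi_1 + c_2U\varphi_2$ in $r_2$. Since $YU\varphi_i = \lambda_iU\varphi_i$, it follows that $M$ realizes $g_2 = \langle U\psi, Y, \Omega \rangle$. By consistency, $V(g_1) = V(g_2)$.

Now define $P_1 = \sum_{k=1}^{m} P_{\chi_k}$ and $P_2 = \sum_{k=m+1}^{m+n} P_{\chi_k}$. Since $P_kU\varphi_j = \delta_{kj}U\varphi_j$, $k, j = 1, 2$, Lemma 3(ii) gives $V[U\psi, Y, \Omega] = V[U\psi, \lambda_1P_1 + \lambda_2P_2, \Omega]$. But $U\psi = \frac{1}{\sqrt{m+n}}\sum_{k=1}^{m+n}\chi_k$. The eigenvalue of $\lambda_1P_1 + \lambda_2P_2$ on $\chi_k$ is $\lambda_1$ for $k = 1, \dots, m$, and $\lambda_2$ otherwise. Applying Theorem 4 for $d = m + n$, the result follows.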
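A worked instance may help. Take $m = 1$, $n = 2$, the smallest case with unequal norms; this choice is ours, purely for illustration:

```latex
\begin{aligned}
\psi &= \sqrt{\tfrac{1}{3}}\,\varphi_1 + \sqrt{\tfrac{2}{3}}\,\varphi_2,
  \qquad U\varphi_1 = \chi_1, \qquad U\varphi_2 = \tfrac{1}{\sqrt{2}}(\chi_2 + \chi_3),\\
U\psi &= \tfrac{1}{\sqrt{3}}\,\chi_1 + \sqrt{\tfrac{2}{3}}\cdot\tfrac{1}{\sqrt{2}}(\chi_2 + \chi_3)
  = \tfrac{1}{\sqrt{3}}(\chi_1 + \chi_2 + \chi_3),\\
V[\psi, X, \Omega] &= V[U\psi, \lambda_1 P_1 + \lambda_2 P_2, \Omega]
  = \tfrac{1}{3}\,\Omega(\lambda_1) + \tfrac{2}{3}\,\Omega(\lambda_2),
  \qquad P_1 = P_{\chi_1},\; P_2 = P_{\chi_2} + P_{\chi_3}.
\end{aligned}
```

Of the three equal-norm components, one displays $\Omega(\lambda_1)$ and two display $\Omega(\lambda_2)$. Theorem 4 therefore gives weights $1/3$ and $2/3$, which equal $|c_1|^2$ and $|c_2|^2$.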

9 Case 4: General Superpositions with Rational Norms

The argument just given assumed $\psi$ was normalized to one. The standard rationale for this is of course based on the probabilistic interpretation of the state, and hence, at least tacitly, on the Born rule. It may be objected that we can derive the dependence of the expectation value on the squares of the norms of the initial state only because this is put in by hand from the beginning. But this suspicion is unfounded. Suppose, indeed, only that $|c_1|^2/|c_2|^2 = m/n$. As before, define $U\varphi_1 = \frac{1}{\sqrt{m}}\sum_{k=1}^{m}\chi_k$ and $U\varphi_2 = \frac{1}{\sqrt{n}}\sum_{k=m+1}^{m+n}\chi_k$. The state $U\psi$ in region $r_2$ will have whatever normalization $\psi$ had in $r_1$. The states $U\varphi_i$, $i = 1, 2$, will be eigenstates of $P_i$, as before, and Definitions 1 and 2 will apply as before. Conclude that if $V$ is consistent, then $V[\psi, X, \Omega] = V[U\psi, \lambda_1P_1 + \lambda_2P_2, \Omega]$, as before. The difference is that now

$$U\psi = \frac{c_1}{\sqrt{m}}\sum_{k=1}^{m}\chi_k + \frac{c_2}{\sqrt{n}}\sum_{k=m+1}^{m+n}\chi_k = \frac{c_1}{\sqrt{m}}\sum_{k=1}^{m+n}\chi_k = \frac{c_2}{\sqrt{n}}\sum_{k=1}^{m+n}\chi_k$$

(adjusting the phases of $c_1$ and $c_2$, using Lemma 3(iii), as required). Evidently we have an initial state which is a superposition of $n + m$ components of equal norm, $m$ of which yield outcome $\Omega(\lambda_1)$ and $n$ of which yield outcome $\Omega(\lambda_2)$.


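Since

$$\frac{|c_1|^2}{|c_1|^2 + |c_2|^2} = \frac{m}{n+m}, \qquad \frac{|c_2|^2}{|c_1|^2 + |c_2|^2} = \frac{n}{n+m},$$

Theorem 4 yields

$$V[\psi, \lambda_1P_{\varphi_1} + \lambda_2P_{\varphi_2}, \Omega] = \frac{|c_1|^2}{|c_1|^2 + |c_2|^2}\,\Omega(\lambda_1) + \frac{|c_2|^2}{|c_1|^2 + |c_2|^2}\,\Omega(\lambda_2). \tag{19}$$

Evidently the normalization of $\psi$ is irrelevant.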

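This result is worth proving in full generality:

Theorem 5 For each $i, j = 1, \dots, d$ let $c_i \in \mathbb{C}$ satisfy $|c_i| > 0$ and $|c_i|^2/|c_j|^2 \in \mathbb{Q}$. Then if $V$ is consistent

$$V\Big[\sum_{k=1}^{d} c_k\varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = \sum_{k=1}^{d} \frac{|c_k|^2}{\sum_{j=1}^{d} |c_j|^2}\,\Omega(\lambda_k). \tag{20}$$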

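Proof. For $\{c_k\}$ as stated, there exist $c \in \mathbb{C}$, positive integers $z_k$ and $\theta_k \in [0, 2\pi]$, $k = 1, \dots, d$, such that $c_k = c\,e^{i\theta_k}\sqrt{z_k}$. Let $m_k$ be integers such that $m_1 = 0$ and $z_k = m_{k+1} - m_k$, $k = 1, \dots, d$. Let $\{\chi_j\}$, $j = 1, \dots, s$, be an orthonormal basis on an $s$-dimensional subspace $\mathcal{H}_s$ of Hilbert space, where $s = \sum_{k=1}^{d} z_k$ (we may suppose that $\chi_j = \varphi_j$ for $j = 1, \dots, d$). Define $U$ on $\mathcal{H}_s$ by the action $U\varphi_k = \frac{1}{\sqrt{z_k}}\sum_{j=m_k+1}^{m_{k+1}}\chi_j$, and let $P_k = \sum_{j=m_k+1}^{m_{k+1}} P_{\chi_j}$, $k = 1, \dots, d$.

Let $\psi = \sum_{k=1}^{d} c_k\varphi_k$, and let $M$ realize $g_1 = \langle \psi, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega \rangle$. Then for some region $r_1$, the initial state of $M$ is $\psi$ and the state of each $M_k$ is $\varphi_k$, with outcome $\Omega(\lambda_k)$. Let the state of $M$ at $r_2$ be $U\psi$. Then $M$ also realizes $g_2 = \langle U\psi, \sum_{k=1}^{d} \lambda_k P_k, \Omega \rangle$, and by consistency $V(g_1) = V(g_2)$. But by construction

$$U\psi = \sum_{k=1}^{d} c_k U\varphi_k = \sum_{k=1}^{d} \frac{c_k}{\sqrt{z_k}} \sum_{j=m_k+1}^{m_{k+1}} \chi_j = c\sum_{k=1}^{d} e^{i\theta_k} \sum_{j=m_k+1}^{m_{k+1}} \chi_j \simeq c\sum_{j=1}^{s} \chi_j \tag{21}$$

where $\simeq$ denotes equality up to the phases $e^{i\theta_k}$, which do not affect $V$ by Lemma 3(iii). The result follows from Theorem 4. Of the $s$ equiprobable outcomes, $z_k$ have outcome $\Omega(\lambda_k)$, so $V(g_2) = \frac{1}{s}\sum_{k=1}^{d} z_k\Omega(\lambda_k)$. But $z_k/s = z_k\big(\sum_{j=1}^{d} z_j\big)^{-1} = |c_k|^2\big(\sum_{j=1}^{d} |c_j|^2\big)^{-1}$. ■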
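The construction in this proof is explicit enough to run. The Python sketch below is a hypothetical illustration, and its function names and sample numbers are our own. It takes the squared moduli $|c_k|^2$ as exact fractions, since only their ratios matter, and finds integers $z_k$ proportional to them via a common denominator. It then lays out the index blocks $m_k + 1, \dots, m_{k+1}$ and checks that every $\chi_j$ in $U\psi$ carries the same squared modulus $|c_k|^2/z_k$, namely the common $|c|^2$. Finally, it compares the Theorem 4 count with the right-hand side of Eq.(20).

```python
from fractions import Fraction
from math import lcm  # math.lcm takes several arguments from Python 3.9 on


def fine_grain(sq_moduli):
    """Given |c_k|^2 with rational ratios (as Fractions), return integers z_k
    proportional to |c_k|^2, and for each k the block of indices
    j = m_k + 1, ..., m_{k+1} of the chi_j over which U spreads phi_k."""
    den = lcm(*(q.denominator for q in sq_moduli))
    z = [int(q * den) for q in sq_moduli]
    m = [0]
    for zk in z:
        m.append(m[-1] + zk)  # m_{k+1} = m_k + z_k
    blocks = [range(m[k] + 1, m[k + 1] + 1) for k in range(len(z))]
    return z, blocks


def fine_grained_expectation(sq_moduli, omega_vals):
    """Theorem 4 applied to U psi: s equal-norm components, each weighted 1/s,
    each displaying the numeral of the channel whose block contains it."""
    z, blocks = fine_grain(sq_moduli)
    s = sum(z)
    return sum(Fraction(len(block), s) * omega_vals[k] for k, block in enumerate(blocks))


sq = [Fraction(5, 2), Fraction(5, 3), Fraction(5, 6)]  # |c_k|^2, not normalized
omega_vals = [1, 2, 3]                                 # hypothetical numerals
z, blocks = fine_grain(sq)

# Each chi_j in U psi has squared modulus |c_k|^2 / z_k, the same for every k
per_component = {q / zk for q, zk in zip(sq, z)}
born = sum(q / sum(sq) * om for q, om in zip(sq, omega_vals))
print(z, per_component, fine_grained_expectation(sq, omega_vals), born)
```

Here $z = (15, 10, 5)$, $s = 30$, every $\chi_j$ has squared modulus $1/6$, and both routes give $5/3$.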


Examination of the proof shows that the dependence of probabilities on the modulus square of the expansion coefficients of the state ultimately derives from the fact that we are concerned with unitary evolutions on Hilbert space. Hilbert space is specifically an inner-product space, and not some general normed linear topological space. A general class of norms on the latter is of the form $\|x\|_p = \big(\sum_{i=1}^{d} |x_i|^p\big)^{1/p}$, $1 \le p < \infty$ ($d$ may also be taken as infinite). Such spaces ($\ell^p$ spaces) are metric spaces and can be completed in norm. The proof as we have developed it would apply equally to a theory of unitary (i.e. invertible norm-preserving) motions on such a space, yielding the probability rule

$$V\Big[\sum_{k=1}^{d} c_k\varphi_k, \sum_{k=1}^{d} \lambda_k P_{\varphi_k}, \Omega\Big] = \sum_{k=1}^{d} \frac{|c_k|^p}{\sum_{j=1}^{d} |c_j|^p}\,\Omega(\lambda_k) \tag{22}$$

(assuming that $|c_j|^p/|c_k|^p \in \mathbb{Q}$, $j, k = 1, \dots, d$). But the only space of this form which is an inner-product space is $p = 2$ (Hilbert space).
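The claim that only $p = 2$ gives an inner-product space can be probed with the parallelogram law, which by the Jordan-von Neumann theorem characterizes norms that come from an inner product. The Python sketch below is illustrative, and its function names are hypothetical. It evaluates the defect $\|x+y\|^2 + \|x-y\|^2 - 2\|x\|^2 - 2\|y\|^2$ for two unit basis vectors. A single non-zero defect rules out an inner product for that $p$. The vanishing defect at $p = 2$, by contrast, is only consistent with the parallelogram law, not a proof of it.

```python
import math


def p_norm(x, p):
    """The l^p norm (sum_i |x_i|^p)^(1/p) of a finite sequence."""
    return sum(abs(xi) ** p for xi in x) ** (1 / p)


def parallelogram_defect(x, y, p):
    """||x+y||^2 + ||x-y||^2 - 2||x||^2 - 2||y||^2 in the l^p norm.
    A norm comes from an inner product iff this vanishes for all x, y
    (Jordan-von Neumann), so one non-zero value rules an inner product out."""
    s = [a + b for a, b in zip(x, y)]
    t = [a - b for a, b in zip(x, y)]
    return (p_norm(s, p) ** 2 + p_norm(t, p) ** 2
            - 2 * p_norm(x, p) ** 2 - 2 * p_norm(y, p) ** 2)


x, y = [1.0, 0.0], [0.0, 1.0]
defects = {p: parallelogram_defect(x, y, p) for p in (1, 2, 3, 4)}
print(defects)
```

For these two vectors the defect is $2^{1+2/p} - 4$, which vanishes only at $p = 2$. The printed value for $p = 2$ is zero up to floating-point rounding.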


10 Case 5: Arbitrary States

There are a variety of possible strategies for the treatment of irrational norms. The most natural, given that we are making use of operational criteria for the interpretation of experiments, is to weaken these criteria in the light of the limitations of realistic experiments. In practice, one would not expect precisely the same state to be prepared on each run of the experiment. Properly speaking, the statistics actually obtained will be those for an ensemble of experiments; correspondingly, they should be obtained from a family of models, differing slightly in their initial states. We should therefore speak of approximate models (or of models that are approximately realized), where the differences among the models are small.

How small is small? What is the topology on the space of states? The obvious answer, from a theoretical point of view, is the norm topology. We should suppose that for sufficiently small $\epsilon$, so long as $\|\psi - \psi'\| < \epsilon$, then if $\langle \psi, X, \Omega \rangle$ is an approximate model for $M$, so is $\langle \psi', X, \Omega \rangle$. Indeed, $X$ and $\Omega$ will likewise be subject to small variations. (Only the outcome set $\mathcal{U}$ can be regarded as precisely specified, insofar as outcomes are identified with numerals.)

But now it is clear that the details are hardly important; any algorithm that applies to families of models of this type, yielding expectation values, will have to be continuous in the norm topology. Given that, the extension of Theorem 5 to the irrational case is trivial. We define:

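Definition 6 Let $g^{(i)} = \langle\psi^{(i)}, X, \Omega\rangle$, $i = 1, 2, \ldots$, be any sequence of models such that $\lim_{i\to\infty}\|\psi^{(i)} - \psi\| = 0$, and let $g = \langle\psi, X, \Omega\rangle$. Then $V$ is continuous in norm if $\lim_{i\to\infty} V(g^{(i)}) = V(g)$.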

We may finally prove:

Theorem 7 Let $V$ be consistent and continuous in norm. Then for any model $\langle\psi, X, \Omega\rangle$,

$$V[\psi, X, \Omega] = \frac{(\psi, \Omega(X)\psi)}{(\psi, \psi)}. \qquad (23)$$
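Proof. It is enough to prove that any realizable model satisfies Eq.(23). If $\langle\psi, X, \Omega\rangle$ is realizable, there is some multiple-channel experiment $M$ with $d$ channels and $D$ outcomes that realizes it. Let $\varphi_k$, $k = 1, \ldots, d$, be any orthonormal family of vectors such that $X\varphi_k = \lambda_k\varphi_k$ (not all the $\lambda_k$ need be distinct). Without loss of generality, let $\psi = \sum_{k=1}^{d} c_k\varphi_k$, $X = \sum_{k=1}^{d}\lambda_k P_{\varphi_k}$. Let $0 < \epsilon < \sum_{k=1}^{d}|c_k|^2$ and let $(c_1^{(i)}, \ldots, c_d^{(i)})$ be any sequence of $d$-tuples such that $\epsilon < \sum_{k=1}^{d}|c_k^{(i)}|^2$, $|c_k^{(i)}|^2 \in \mathbb{Q}$, and $\lim_{i\to\infty} c_k^{(i)} = c_k$ (such a sequence can always be found). Let $\psi^{(i)} = \sum_{k=1}^{d} c_k^{(i)}\varphi_k$ and $g^{(i)} = \langle\psi^{(i)}, X, \Omega\rangle$; since $d$ is finite, $\lim_{i\to\infty}\|\psi^{(i)} - \psi\| = 0$. By Theorem 5,

$$V[\psi^{(i)}, X, \Omega] = \frac{\sum_{k=1}^{d}|c_k^{(i)}|^2\,\Omega(\lambda_k)}{\sum_{j=1}^{d}|c_j^{(i)}|^2} = \frac{\big(\psi^{(i)}, \sum_{k=1}^{d}\Omega(\lambda_k)P_{\varphi_k}\psi^{(i)}\big)}{\sum_{j=1}^{d}|c_j^{(i)}|^2}.$$

The numerator is $(\psi^{(i)}, \Omega(X)\psi^{(i)})$; the denominator is bounded below by $\epsilon > 0$, with $\lim_{i\to\infty}\sum_{j=1}^{d}|c_j^{(i)}|^2 = \sum_{j=1}^{d}|c_j|^2 = (\psi, \psi)$ (by the continuity of the inner product); and since $\lim_{i\to\infty}(\psi^{(i)}, \Omega(X)\psi^{(i)}) = (\psi, \Omega(X)\psi)$ (again by the continuity of the inner product), the result follows from the continuity of $V$. ■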
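The limiting step in the proof can be illustrated numerically. The Python sketch below, a minimal illustration under stated assumptions, replaces the squared moduli $|c_k|^2$ by rational approximants (using the standard library's `Fraction.limit_denominator`), applies the rational-norm formula of Theorem 5 exactly, and compares the result with $(\psi, \Omega(X)\psi)/(\psi, \psi)$. The helper names are hypothetical, and since floating-point numbers are themselves rational, the value $|c_2|^2 \approx \pi$ only stands in for a genuinely irrational norm.

```python
import math
from fractions import Fraction

def rational_sq_norms(coeffs, max_den):
    """Rational approximants q_k ~ |c_k|^2 with denominators at most max_den
    (playing the role of |c_k^(i)|^2 in the proof)."""
    return [Fraction(abs(c) ** 2).limit_denominator(max_den) for c in coeffs]

def value_rational(sq_norms, payoffs):
    """Theorem 5 form, evaluated exactly: sum_k q_k Omega(lambda_k) / sum_j q_j."""
    return sum(q * Fraction(o) for q, o in zip(sq_norms, payoffs)) / sum(sq_norms)

def value_limit(coeffs, payoffs):
    """Eq.(23) in the eigenbasis of X: (psi, Omega(X) psi) / (psi, psi)."""
    num = sum(abs(c) ** 2 * o for c, o in zip(coeffs, payoffs))
    den = sum(abs(c) ** 2 for c in coeffs)
    return num / den

# Hypothetical example: |c_1|^2 = 1, |c_2|^2 ~ pi; Omega(lambda_1) = +1, Omega(lambda_2) = -1.
coeffs, payoffs = [1.0, math.sqrt(math.pi)], [1, -1]
target = value_limit(coeffs, payoffs)
errors = []
for max_den in (1, 10, 100, 1000):
    approx = value_rational(rational_sq_norms(coeffs, max_den), payoffs)
    errors.append(abs(float(approx) - target))
    print(max_den, approx, float(approx), target)
```

The printed errors shrink as the rational approximants improve, which is the content of the continuity assumption in Definition 6.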

A similar proof can be given for a general probability rule on $l^p$ spaces, $p \neq 2$ (i.e. Eq.(22), for arbitrary complex coefficients); of course this result could not be expressed as in Eq.(23), using an inner product.

Is a continuity assumption permitted in the present context? Gleason’s theorem does not require it; if one is going to do better than Gleason’s theorem, it would be pleasant to derive the continuity of the probability measure, rather than to assume it. But from an operational point of view continuity is a very natural assumption: no algorithm that could ever be used is going to distinguish between states that differ infinitesimally.


11 A Role for Decision Theory

Deutsch [2] took a rather different view: he was at pains to establish the Born rule for irrational norms, without assuming continuity. His method, however, was far from operational: along with axioms of decision theory, he assumed that quantum mechanics is true (under the Everett interpretation).

A hybrid is possible: the present method can in fact be supplemented with axioms of decision theory, yielding the Born rule for irrational norms, without any continuity assumption. But as Wallace [12] makes clear, nothing much hangs on this question. One can do without a continuity assumption, but there are just as good reasons to invoke it from a decision-theoretic point of view as from an operational one. In neither case is there any reason to distinguish between states that differ infinitesimally.

Decision theory is important for a rather different reason: the non-probabilistic parts of decision theory (as Deutsch puts it), or decision theory in the face of uncertainty (as Wallace puts it), can provide an account of probability in terms of something else. This matters in the case of the Everett interpretation; according to many, the Everett interpretation has no place for probability [7]; given Everett, probability cannot be taken as primitive.

So it is clear why Deutsch took the more austere line: if Everett is to be believed, quantum mechanics is purely deterministic. Deutsch supposed that the fundamental concept (that can be taken as primitive) is rather the value or the utility that an agent places upon a model - that $V(g)$ is in fact a utility. He argued that experiments should be thought of as games; for each registered outcome in $U$, we are to associate some utility, fixed in advance. So, in effect, the mapping $\Omega: \lambda_k \mapsto \Omega(\lambda_k) \in U$ defines the payoff for the outcome $\lambda_k$.

Decision theory on this approach has a substantial role. If we suppose that the utilities of a rational agent are ordered, and satisfy very general assumptions (“axioms of rationality”), a representation theorem can be derived [10] which defines subjective probability in terms of the ordering of an agent’s utilities. In effect, one deduces - in accordance with these axioms - that the agent acts as if she places such-and-such subjective probabilities on the outcomes of various actions.⁸

It is important that one can still make sense of uncertainty in this context, as Wallace explains. It may be that we cannot help ourselves to probabilistic ideas ab initio, but that does not mean that one only deals with certainties - that games, in some sense, have only a single payoff, as Deutsch at one point suggests [2, pp. 3132-3]. From a first-person perspective, one does not know what outcome of a quantum game to expect to observe (there is certainly no first-person perspective from which they can all be observed). In fact, it is enough that - in the face of branching - a rational agent expects anything at all (that she does not expect oblivion [9]).

On this line of thought, the proofs of the Born rule just presented make an illegitimate assumption: Eq.(1). We are not entitled to assume that the macroscopic outcomes $u_j \in U$, $j = 1, \ldots, D$, occur with probabilities $p_j$, for they all occur; so neither can we assume there are non-negative real numbers, summing to one, satisfying Eq.(2). But the proof of Theorem 4 (hence of Theorems 5 and 7) depended on this assumption. Of course we may, with Deutsch and Wallace, eventually be in a position to make statements about the subjective probabilities of branches, but if so, such statements will have to come at a later stage - after establishing the values $V(g)$ of various games. But then how are we to establish these values?

Here Wallace has provided a considerably more detailed analysis than Deutsch, and from weaker premises. But the proofs are correspondingly more complicated; for the sake of simplicity we shall only consider Deutsch’s argument, removing the ambiguities of notation in the way shown by Wallace.

8 This does not mean that subjective probabilities are illusory, and correspond to nothing in reality. The point is to legitimate the concept, not to abolish it. As for its objective correlate, the most popular candidate has long been relative frequency (of outcomes in a sequence of trials). Relative frequencies are obviously important when it comes to evidence for probabilities, but there are well-known difficulties with trying to identify them with probabilities (for anything short of infinite sequences). We read Everett as making a contrary proposal: that the objective correlates of subjective probability are branches in the universal state (with respect to the decoherence basis). Here we are deducing the quantitative rule to be used in assigning subjective probabilities to branches.

First, consider Case 1, the Stern-Gerlach experiment. All is in order up to Eq.(11), but we must do without the assumption subsequently made - that the registered outcome $\Omega(+)$ results with probability $w$, and outcome $\Omega(-)$ with probability $1 - w$. Here Deutsch invokes a new principle, what he calls the zero-sum rule:


$$V[\psi, X, \Omega] = -V[\psi, X, -\Omega]. \qquad (24)$$

Following Deutsch, let us assume that the numerical value of the utility $\Omega(\lambda_k)$ equals $\lambda_k$. Then, in the special case where $\lambda_1 = -\lambda_2$ (true for the measurement of a component of spin), from Eq.(24), applied to Eq.(11), we deduce:

$$V[c_1\varphi_1 + c_2\varphi_2, \sigma_z, \Omega] = -V[c_1\varphi_1 + c_2\varphi_2, \sigma_z, \Omega] \qquad (25)$$

and hence that $V[c_1\varphi_1 + c_2\varphi_2, \sigma_z, \Omega] = 0$, in accordance with the Born rule in this special case.
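As a consistency check (not a derivation), the following Python sketch takes the Born-rule value function as given and confirms that it obeys the zero-sum rule, Eq.(24), and that for equal amplitudes, with $\Omega(\lambda) = \lambda$ and $\lambda_1 = -\lambda_2$, it returns 0, as Eq.(25) requires. The function `born_value`, the payoff names and the sample amplitudes are illustrative assumptions.

```python
import math

def born_value(coeffs, eigvals, payoff):
    """Born-rule value V[psi, X, Omega] = sum_k |c_k|^2 Omega(lambda_k) / sum_j |c_j|^2,
    where coeffs are the components c_k of psi in an eigenbasis of X."""
    weights = [abs(c) ** 2 for c in coeffs]
    return sum(w * payoff(lam) for w, lam in zip(weights, eigvals)) / sum(weights)

def gambler(lam):
    """Deutsch's convention: the utility Omega(lambda) equals lambda."""
    return lam

def banker(lam):
    """The banker's side of the same game: payoff -Omega(lambda)."""
    return -gambler(lam)

spin_z = [1.0, -1.0]                          # eigenvalues of sigma_z, so lambda_1 = -lambda_2
unequal = [0.6, 0.8j]                         # arbitrary amplitudes c_1, c_2
equal = [1 / math.sqrt(2), 1 / math.sqrt(2)]  # the equal-norm superposition of Case 1

# Zero-sum rule, Eq.(24): V[psi, X, Omega] = -V[psi, X, -Omega].
print(born_value(unequal, spin_z, gambler), -born_value(unequal, spin_z, banker))
# Equal amplitudes: swapping phi_1 and phi_2 leaves psi unchanged, and the value is 0.
print(born_value(equal, spin_z, gambler), born_value(equal[::-1], spin_z, gambler))
```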

Although evidently of limited generality, the result is illustrative - assuming the zero-sum rule can be independently justified. (Of course it follows trivially from Eq.(2), but this was derived from Eq.(1), and at this point we cannot make use of the concept of probability.) Here is an argument: banking too is a form of gambling; the only difference between acting as the gambler who bets, and as the banker who accepts the bet, is that whereas the gambler pays a stake in order to play, and receives payoffs according to the outcomes, the banker receives the stake in order to play, and pays the payoffs according to the outcomes. The zero-sum rule is the statement that the most that one will pay in the hope of gaining a utility is the least that one will accept to take the risk of losing it. We may take it that this principle, as a principle of zero-sum games, is perfectly secure. And evidently any quantum experiment can be used to play a zero-sum game; therefore this principle also applies to the expected utility of experiments.

What of the general equal-norm case, Case 2? Here the zero-sum rule is not enough. But if we consider only the case $d = 2$, it is enough to supplement it with another rule, what Deutsch calls the additivity rule. A payoff function $\Omega: \mathbb{R} \to U$ is additive if and only if $\Omega(x + y) = \Omega(x) + \Omega(y)$. Let $f_k: \mathbb{R} \to \mathbb{R}$ be the function $f_k(x) = x + k$; then $V$ is additive if and only if

$$V[\psi, X, \Omega \circ f_k] = V[\psi, X, \Omega] + \Omega(k). \qquad (26)$$

Additivity of the payoff function is a standard assumption of elementary decision theory, eminently valid for small bets (but hardly valid for large ones, or for utilities that only work in tandem). Additivity of $V$ then has a clear rationale: it is an example of a sure-thing principle, namely that, given two games that are exactly the same except that in one of them one receives an additional utility $\Omega(k)$ whatever the outcome, one should value that game as having an additional utility $\Omega(k)$.
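To see how additivity can be used in Case 2 (but restricted to $d = 2$), observe that for $k = -\lambda_1 - \lambda_2$, the function $-I \circ f_k$ is the permutation $\pi$: it maps $\lambda_1$ to $-(\lambda_1 + k) = \lambda_2$ and $\lambda_2$ to $\lambda_1$. Therefore from Eq.(14) we may conclude:

$$V[\psi, X, \Omega] = V[\psi, X, \Omega \circ (-I) \circ f_k]. \qquad (27)$$

By additivity the RHS is $V[\psi, X, \Omega \circ (-I)] + (\Omega \circ (-I))(k)$, and since $\Omega$ is additive (so $\Omega \circ (-I) = -\Omega$) we obtain, from the zero-sum rule,

$$V[\psi, X, \Omega] = -V[\psi, X, \Omega] - \Omega(k). \qquad (28)$$

With a further application of payoff additivity, $-\Omega(k) = \Omega(\lambda_1 + \lambda_2) = \Omega(\lambda_1) + \Omega(\lambda_2)$, there follows

$$V[\psi, X, \Omega] = \tfrac{1}{2}[\Omega(\lambda_1) + \Omega(\lambda_2)] \qquad (29)$$

in accordance with the Born rule.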
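The steps from Eq.(26) to Eq.(29) can likewise be checked numerically for the Born-rule value function. In the Python sketch below, the eigenvalues, the additive payoff $\Omega(x) = 3x$ and the helper names are arbitrary illustrative choices. It verifies that $-I \circ f_k$ with $k = -\lambda_1 - \lambda_2$ swaps $\lambda_1$ and $\lambda_2$, that Eq.(26) holds, and that equal amplitudes give $\tfrac{1}{2}[\Omega(\lambda_1) + \Omega(\lambda_2)]$. It checks consistency with these rules; it does not replace the derivation.

```python
import math

def born_value(coeffs, eigvals, payoff):
    """Born-rule value: sum_k |c_k|^2 Omega(lambda_k) / sum_j |c_j|^2."""
    weights = [abs(c) ** 2 for c in coeffs]
    return sum(w * payoff(lam) for w, lam in zip(weights, eigvals)) / sum(weights)

def shift(k):
    """f_k(x) = x + k."""
    return lambda x: x + k

def compose(f, g):
    """(f o g)(x) = f(g(x))."""
    return lambda x: f(g(x))

def minus_identity(x):
    """-I(x) = -x."""
    return -x

def omega(x):
    """A hypothetical additive payoff, Omega(x) = 3x."""
    return 3.0 * x

lam1, lam2 = 2.0, 5.0                       # hypothetical eigenvalues, d = 2
eigvals = [lam1, lam2]
k = -lam1 - lam2
perm = compose(minus_identity, shift(k))    # -I o f_k, which should act as the permutation pi

coeffs = [0.6, 0.8]                         # arbitrary amplitudes for the additivity check
lhs = born_value(coeffs, eigvals, compose(omega, shift(k)))
rhs = born_value(coeffs, eigvals, omega) + omega(k)    # Eq.(26)

equal = [1 / math.sqrt(2), 1 / math.sqrt(2)]
print(perm(lam1), perm(lam2), lhs, rhs, born_value(equal, eigvals, omega))
```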

As Wallace has shown, this, along with the higher-dimensional cases ($d > 2$), can be derived from much weaker axioms of decision theory, that do not assume additivity. Theorem 5 then goes through unchanged.⁹ As already remarked, one is then in a position to derive the extension to the irrational case without assuming continuity: for the details, I refer to Wallace [12].

Decision theory can evidently play a role in the derivation of the Born rule, but it is only needed if the notion of probability is itself in need of justification. That may well be so, in the context of the Everett interpretation; but on other approaches to quantum mechanics, probability, whatever it is, can be taken as given.


12 Gleason’s Theorem

Compare Gleason’s theorem:

Theorem 8 Let $f$ be any function from 1-dimensional projections on a Hilbert space of dimension $d > 2$ to the unit interval, such that for each resolution of the identity $\{P_k\}$, $k = 1, \ldots, d$, $\sum_{k=1}^{d} P_k = I$, we have $\sum_{k=1}^{d} f(P_k) = 1$. Then there exists a unique density matrix $\rho$ such that $f(P_k) = \mathrm{Tr}(\rho P_k)$.

Proof. Gleason (1957) ■
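The easy direction of Theorem 8 can be checked directly: for any density matrix $\rho$, $f(P) = \mathrm{Tr}(\rho P)$ sums to one over every resolution of the identity into 1-dimensional projections. The Python sketch below uses only the standard library and a hypothetical mixed state in $d = 3$; it builds a random orthonormal basis by Gram-Schmidt and evaluates $f(P_e) = \langle e|\rho|e\rangle$. Gleason’s theorem is the much harder converse, which no finite check of this kind can establish.

```python
import math
import random

def inner(u, v):
    """Inner product <u, v>, conjugate-linear in the first argument."""
    return sum(complex(a).conjugate() * b for a, b in zip(u, v))

def normalize(v):
    norm = math.sqrt(inner(v, v).real)
    return [x / norm for x in v]

def random_orthonormal_basis(d, rng):
    """Gram-Schmidt on random complex vectors, giving a random resolution of the
    identity P_k = |e_k><e_k|, k = 1, ..., d."""
    basis = []
    while len(basis) < d:
        v = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(d)]
        for e in basis:
            overlap = inner(e, v)
            v = [x - overlap * y for x, y in zip(v, e)]
        if math.sqrt(inner(v, v).real) > 1e-9:
            basis.append(normalize(v))
    return basis

def f_rho(mixture, e):
    """f(P_e) = Tr(rho P_e) = <e|rho|e> for rho = sum_i p_i |psi_i><psi_i|."""
    return sum(p * abs(inner(e, psi)) ** 2 for p, psi in mixture)

# Hypothetical mixed state on a 3-dimensional Hilbert space.
rho = [(0.25, normalize([1, 1j, 0])), (0.75, normalize([0, 1, 1]))]
basis = random_orthonormal_basis(3, random.Random(0))
values = [f_rho(rho, e) for e in basis]
print(values, sum(values))
```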

A first point is that the derivation of the Born rule presented here concerns the notion of a fixed algorithm that applies to arbitrary measurement models, hence to Hilbert spaces of arbitrary dimension, whereas Gleason’s theorem concerns an algorithm that applies to arbitrary resolutions of the identity on a Hilbert space of fixed dimension. Although the proofs of Theorems 5 and 7 made use of a Hilbert space of large dimensionality, they apply to the 2-dimensional case as well.

9 It is worth remarking that a derivation of the Born rule for initial states that are not normalized to unity is just what is needed for the Everett interpretation, as also for the de Broglie-Bohm theory (in reality, according to either approach, one always deals with branch amplitudes with modulus strictly less than one - supposing the initial state of the universe has modulus one).

More importantly, on a variety of approaches to quantum mechanics, nothing so strong as Gleason’s premise is really motivated. It is not required that probabilities can be defined for a projector independent of the family of projectors of which it is a member. This requirement, sometimes called non-contextuality [8], is very strong. Very few approaches to quantum mechanics subscribe to it. The theorem has no relevance to any approach that singles out a unique basis once and for all: it applies neither to the GRW theory [5], nor to the de Broglie-Bohm theory [6], which single out the position basis; it does not apply to the Everett interpretation [?], which singles out a basis approximately localized in phase space; it does not apply to the consistent histories approach [3], assuming the choice of decoherent history space is unique. All these theories require only that probabilities be defined for projectors associated with the preferred basis - if they apply to any other resolution of the identity, it is insofar as in a particular context, experimental or otherwise, the latter projections become correlated with members of the former family.

But so much is entirely compatible with the derivation that we have offered. By all means restrict Definition 1 to observables compatible with a unique resolution of the identity (and likewise the consistency condition of Definition 2). Lemma 3 proves identities for expectation values for commuting observables; it likewise can be restricted to a unique resolution of the identity, as can Theorem 4. In Theorem 5 an auxiliary basis was used, but again this can be taken as the preferred basis. And whilst it is in the spirit of Theorem 7 that probabilities should also be defined for small variations in projectors, this does not yet amount to the assumption of non-contextuality.

Unlike the premise of Gleason’s theorem, the operational criteria that we have used are hardly disputed; they are common ground to all the major schools of foundations of quantum mechanics. But it would be wrong to suggest that they apply to all of them equally: on some approaches - in particular, those that provide a detailed dynamical model of measurements - there is good reason to suppose that an algorithm for expectation values will depend on additional factors (in particular, on the state at the instant of state reduction); the Born rule may no longer be forced in consequence. (But we take it that this would be an unwelcome consequence of these approaches; the Born rule will have to be otherwise justified - presumably, as it is in the GRW theory, as a hypothesis.)

Of the major schools, two - the Everett interpretation, and those based on operational assumptions (here we include the Copenhagen interpretation) - offer no such resources. This point is clear enough in the latter case; in the case of the Everett interpretation, the association of models with multiple-channel experiments as given in Definition 1 follows from the full theory of measurement.¹⁰ Quantum mechanics under the Everett interpretation provides no leeway in this matter. The same is likely to be true of any approach to quantum mechanics that preserves the unitary formalism intact, without any supplement to it.

10 For arguments in the still more general case, on applying the Everett theory of measurement to any experiment, I refer to Wallace [12] (see in particular his principle of “measurement neutrality”).

The principal remaining schools have a rather different status. One, the state-reduction approach, has already been remarked on: a new and detailed dynamical theory of measurement is likely to offer novel definitions of experimental models and novel criteria for when they are to be applied. The other is the hidden-variable approach, in which the state evolves unitarily even during measurements (but is incomplete). This case deserves special consideration.

13 Completeness

As it happens, the one approach to foundations in which the Born rule has been seriously questioned is an example of this type (the de Broglie-Bohm theory) [11]. Hidden variables certainly make a difference to the argument we have presented. Consider the proof of Theorem 4. The passage from Eq.(13) to Eq.(14) hinged on the fact that the state on both sides of Eq.(13) is identical when the norms of its components are the same. (Likewise the step from Eq.(10) to (11).) But if the state is incomplete, this is not enough to ensure the required identification. Including the state of the hidden variables as well (denote it $\omega$), we should replace $\psi$ by the pair $\langle\psi, \omega\rangle$ ($\omega$ may be the value of the hidden variable, or a probability distribution over its values). Doing this, as Wallace has pointed out [12], there is no guarantee that in the case of superpositions of equal norms - e.g. for $\psi = \frac{1}{\sqrt{2}}(\varphi_1 + \varphi_2)$, where $\varphi_1, \varphi_2$ are, as in Case 1, eigenstates of the $z$-component of spin - $U_\pi$ (permuting $\varphi_1$ and $\varphi_2$) will act as the identity. Although $U_\pi\psi = \psi$, its action on $\langle\psi, \omega\rangle$ may well be different from the identity; how is the permutation to act on the hidden variables?

The question is clearer when $U_\pi$ implements a spatial transformation. We have an example where it does: the Stern-Gerlach experiment. In this case $U_\pi\sigma_z U_\pi^{-1} = -\sigma_z$, a reflection in the $x$-$y$ plane. Under the latter, a particle initially with positive $z$-coordinate ($\omega = +$) is mapped to one with negative $z$-coordinate ($\omega = -$). Under this same transformation, the superposition $\psi = \frac{1}{\sqrt{2}}(\varphi_1 + \varphi_2)$ is unchanged. Therefore $U_\pi: \langle\psi, +\rangle \mapsto \langle\psi, -\rangle \neq \langle\psi, +\rangle$; there is no longer any reason to suppose that Eq.(11) will be satisfied.

This situation is entirely as expected. In the de Broglie-Bohm theory, given such an initial state $\psi$, it is well known that if the incident particle is located on one side of the plane of symmetry of the Stern-Gerlach apparatus, then it will always remain there. It is obvious that if the particle is always located on the same side of this plane, on repetition of the experiment, the statistics of the outcomes will disagree with the Born rule. It is equally clear that if particles are randomly distributed about this plane of symmetry then the Born rule will be obeyed - but that is only to say that the probability distribution for the hidden variables is determined by the state, in accordance with the Born rule. This is what we are trying to prove.
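The point about the plane of symmetry can be illustrated with a deliberately crude toy model. Its rule is an assumption of this sketch, not a computation of Bohmian trajectories: each particle’s registered outcome is fixed by the side of the symmetry plane on which it starts, since it never crosses that plane. For $\psi = \frac{1}{\sqrt{2}}(\varphi_1 + \varphi_2)$ the Born rule assigns probability 1/2 to each outcome; an initial distribution symmetric about the plane reproduces this, while a one-sided distribution does not.

```python
import random

def fraction_up(z_positions):
    """Toy rule (an assumption): outcome 'up' iff the particle starts above the
    symmetry plane z = 0, which it never crosses."""
    return sum(1 for z in z_positions if z > 0) / len(z_positions)

rng = random.Random(1)
n = 10_000
symmetric = [rng.gauss(0.0, 1.0) for _ in range(n)]     # distribution symmetric about z = 0
one_sided = [rng.uniform(0.01, 1.0) for _ in range(n)]  # every particle starts above the plane

print(fraction_up(symmetric))  # close to the Born value 1/2
print(fraction_up(one_sided))  # every run registers 'up', contradicting the Born rule
```

The symmetric case agrees with the Born rule only because the distribution was chosen to respect the symmetry of the state, which is the point made in the text.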

But it does not follow that the arguments we have given have no bearing on such a theory. Our strategy, recall, was to derive constraints on an algorithm - any algorithm - that takes as inputs experimental models and yields as outputs expectation values. The constraints will apply even if the state is incomplete, even if there are additional parameters controlling individual measurement outcomes - so long as the state alone determines the statistical distribution of the hidden variables. Given that, any symmetries of the state will also be symmetries of the distribution of hidden variables. In application to the de Broglie-Bohm theory, our result indeed implies that the particle distribution must be given by the Born rule - this is no longer an additional postulate of the theory - so long as the particle distribution is determined only by the state. The assumption is not that particles must be distributed in accordance with the Born rule, but that they are distributed by any rule at all that is determined by the state. Then it is the Born rule.